ITERATIVE, MAXIMALLY PROBABLE, BATCH-MODE
COMMERCIAL DETECTION FOR AUDIOVISUAL CONTENT 
Michael L. Harville 



BACKGROUND OF THE INVENTION 
1. Field of the Invention

This invention relates to determining the starting and 
ending times of commercial breaks, as well as the starting 
and ending times of commercials within those commercial 
breaks, in audiovisual content (e.g., a television
broadcast).

2. Related Art 

There are a variety of previous approaches to detecting
commercials in a television broadcast. However, previous
approaches suffer from a flaw in that they act as relatively
simple finite state machines with little or no error
recovery. Previous approaches sometimes make an erroneous
decision regarding a commercial location which only becomes
apparent by considering data well before or after the
commercial location in time. However, since previous
approaches do not consider such data, information regarding
the erroneous decision is ignored and the error remains
uncorrected.

Merlino et al. of the MITRE Corporation describe a
multiple-cue-based method for segmenting news programs,
including finding the commercial breaks. The Merlino et al.
method uses black frames, audio silence and blank
closed-captioning to find commercials. R. Lienhart et al. of
the University of Mannheim also describe a multiple-cue-based
system for detecting commercials. The Lienhart et al. method
uses black frames, scene cuts and a measure of motion in a
visual recording to detect commercials. The Informedia
project at Carnegie Mellon University used black frames,
scene cuts and lapses in closed-captioning to detect
commercials. Additionally, some VCRs come with commercial
detection built in. There are also a number of
patents that describe methods for commercial detection, all
of which use coincidences of black frames, audio silence
and/or certain closed-captioning signals to detect
commercials. For example, U.S. Patent Nos. 4,319,286,
4,750,053, 4,390,904, 4,782,401 and 4,602,297 detect
commercials based on these types of coincidences.

All of the previous approaches to commercial detection
select commercial start or end times as the times at which
some combination of cues, such as a black frame and an audio
pause, coincide, with some optional restrictions. Typically,
in previous approaches the decision as to whether or not a
commercial starts at a particular time "t" is independent of
the analogous decision for any other time in the audiovisual
broadcast or recording. Previous approaches in which such
complete independence does not exist use only a very limited
dependence in which the decision at time "t" may be affected
by whether or not a commercial was thought to start within
some window of time [t-n, t] prior to the time "t".
Thus, in previous approaches, a commercial detection decision
made at any time "t" in a broadcast or recording is not
affected by parts of the broadcast or recording following
time "t" and only sometimes is affected by limited parts
(less than one minute) of the broadcast or recording
immediately prior to time "t". Further, none of the previous
approaches has any sort of double-checking or error
recovery; once a decision is made for time "t", by whatever
heuristic the approach uses, the decision remains unchanged
no matter what happens in the broadcast or recording after
time "t" and no matter what other decisions are made before
or after time "t". In summary, previous approaches to
commercial detection make decisions as to commercial
locations both time-locally and sequentially, i.e., only data
from within a narrow time window about a particular time "t" is
considered in making the decision as to whether a commercial
starts or ends at that time "t", and the decisions are made one
at a time and are never reversed.



SUMMARY OF THE INVENTION 

In accordance with the invention, the starting and
ending times of commercial breaks, as well as the starting
and ending times of commercials within those commercial
breaks, can be found in audiovisual content (e.g., a
television broadcast) using a method having characteristics
which overcome the above-described disadvantages of the prior
art. The invention is implemented as a solution to a "batch
optimization" problem in which commercial locations within a
set of audiovisual content are detected as a group by
choosing a set of commercial locations which optimizes a cost
function which can include consideration of, for
example, 1) one or more of many types of visual recording,
audio recording and/or closed-captioning cues, 2) relative
locations of commercials within the audiovisual content,
and/or 3) probability models based on statistics obtained
regarding characteristics of typical commercials and
commercial breaks (e.g., commercial and commercial break
duration, separation times of commercials and commercial
breaks, likelihood of the presence of a commercial at any
given time in a set of audiovisual content). Optimization
can be done over the total set of commercial location
decisions, rather than on a per-commercial basis.
Additionally, the cost function can be iteratively evaluated,
increasing the accuracy of commercial location decisions
produced by the method. The logic for making decisions
regarding detection of commercials in accordance with the
invention is orders of magnitude more sophisticated than that
of other approaches to commercial detection and produces
correspondingly superior results. Additionally, many more
types of information (cues) can be used in detection of
commercials in accordance with the invention than have been
used in other approaches to commercial detection.

The invention can make use of any one of a number of
particular novel collections of cues to enable commercial
detection. Previous approaches to commercial detection have
used some of the cues that can be used in commercial
detection according to the invention, but have not used all
of the cues or collections of cues that can be used in
accordance with the invention. Additionally, cues other than
those specifically described herein can be used to enable
commercial detection according to the invention.

Embodiments of the invention can have the following
advantageous characteristics. The invention can be
implemented so that commercial detection is performed
iteratively. The invention can be implemented so that
audiovisual content occurring before and/or after the time of
a possible commercial is considered in deciding whether a
commercial is present at that time. In particular, the
invention can be implemented so that commercials are detected
in a set of audiovisual content by considering all of that
set of audiovisual content. The invention can be implemented
so that the presence of multiple commercials in a set of
audiovisual content can be detected at the same time. The
invention can be implemented so that a decision regarding
whether a commercial is detected at a particular location in
a set of audiovisual content is dependent on the possible
detection of one or more commercials at other location(s) in
the set of audiovisual content. In particular, the invention
can be implemented so that whether a commercial is detected
at a particular location in a set of audiovisual content is
dependent on all other possible detections of a commercial in
the set of audiovisual content. Finally, as discussed above,
the invention can be implemented so that commercial detection
is based on the evaluation of a variety of novel cues or
combinations of cues. Each of these aspects of the invention
produces improved commercial detection results as compared to
previous approaches to commercial detection.

The invention can be used for a wide variety of
applications and purposes, as will be appreciated by those
skilled in the art in view of the description herein. For
example, commercial detection in accordance with the
invention can be useful for enabling the observation of
audiovisual content without commercial interruption,
recording audiovisual content so that commercials are deleted
from the recorded audiovisual content, searching for a
particular commercial or a commercial of a particular type
within audiovisual content, and customizing commercials in
audiovisual content. Additionally, the invention can be used
with, or implemented in, for example, conventional network
television broadcasts, cable television broadcasts,
television set-top boxes and digital VCRs.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a flow chart of a method according to the 
invention. 

Figure 2 is a flow chart of a method according to a
particular embodiment of the invention.

Figure 3A is a graph of an example of a function P(t) 
which indicates the likelihood that a commercial starts or 
ends at each time during a set of audiovisual content. 

Figure 3B is a graph of an example of a function S(t)
which represents a probability model of the likely location,
relative to a particular commercial start or end time, of 
other commercial start or end times. 

Figure 3C is a graph of a function C(t) produced by
convolving the function P(t) of Figure 3A with the function
S(t) of Figure 3B.

Figure 3D is a graph of an example of a function R(t)
which represents a probability model of the likelihood, at
all times within a set of audiovisual content, that a
commercial is in progress.

Figure 3E is a graph of a function P*(t) produced by
point-wise multiplying the function C(t) of Figure 3C by the
function R(t) of Figure 3D.

Figure 3F is a graph of an example of a function L(t)
which represents a probability model of the typical duration
of a commercial break.

Figure 3G is a graph of an example of a function W(t) 
which represents a probability model of the typical time 
separation between commercial breaks in a set of audiovisual
content.
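
The relationship among the functions of Figures 3A through 3E can be sketched numerically. The following minimal Python sketch is illustrative only: the sample values, the 15-second sampling interval, the zero padding at the ends of the content and the helper names `convolve_same` and `pointwise_multiply` are all assumptions made for the example and are not taken from the figures. The values are left unnormalized.

```python
def convolve_same(p, s):
    """Discrete convolution of p with a kernel s centered on its middle tap.

    Returns a list the same length as p. Samples outside p are treated
    as zero (an assumption of this sketch).
    """
    half = len(s) // 2
    out = []
    for t in range(len(p)):
        total = 0.0
        for k, weight in enumerate(s):
            src = t - (k - half)  # sample of p reaching time t through tap k
            if 0 <= src < len(p):
                total += p[src] * weight
        out.append(total)
    return out


def pointwise_multiply(c, r):
    """Multiply two functions sampled on the same time grid."""
    if len(c) != len(r):
        raise ValueError("functions must share the same sampling grid")
    return [a * b for a, b in zip(c, r)]


# Hypothetical data, one sample per 15 seconds.
P = [0.0, 0.9, 0.0, 0.8, 0.0, 0.0, 0.0, 0.3]  # start/end likelihood, as in P(t)
S = [0.5, 0.0, 1.0, 0.0, 0.5]                 # peaks at 0 and +/- 30 s, as in S(t)
R = [0.2, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2]  # commercial in progress, as in R(t)

C = convolve_same(P, S)             # corresponds to C(t)
P_star = pointwise_multiply(C, R)   # corresponds to P*(t)
```

In this example the two candidate samples that lie 30 seconds apart reinforce one another in C, while the isolated sample at the end of the content receives no such support and is further reduced by the low value of R at that time.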

DETAILED DESCRIPTION OF THE INVENTION

Figure 1 is a flow chart of a method 100 according to
the invention for detecting one or more commercial breaks in
a set of audiovisual content, each commercial break including
one or more commercials. The method 100 identifies starting
and ending times for each commercial break, as well as
starting times for each commercial within a commercial break.

In step 101 of the method 100, the data representing the
audiovisual content (including closed-captioning or other
transcription data, if applicable) is analyzed to identify
the presence of one or more predetermined types of cues. The
location and duration of each cue (i.e., the beginning and
ending times of each cue) can be determined. Other
characteristics of the cues can be determined as well.
Examples of the types of cues that can be identified are
discussed in more detail below.

In step 102 of the method 100, the cues are analyzed to
identify possible locations of commercial beginnings or
endings (candidate times) within the audiovisual content.
Examples of such analysis are discussed further below. A
score is assigned to each candidate time, the score
representing a probability that the candidate time is in fact
a beginning or ending of a commercial.

In step 103 of the method 100, the scores associated
with each candidate time are adjusted. The score for a
candidate time can be adjusted, for example, based on an
analysis of one or more cues proximate to the candidate time
that are different from the one or more cues used to identify
the candidate time. For instance, as described in more
detail below with respect to step 206 of the method 200 (see
Figure 2), the score associated with a candidate time can be
adjusted based on the presence or absence of one or more cues
within a specified time window that includes the candidate
time or to which the candidate time is sufficiently proximate
(i.e., is less than a specified short amount of time, such as
several seconds, before or after the time window). The score
for a candidate time can also be adjusted, for example, based
on an evaluation of the relationship between the candidate
time and one or more other candidate times. In particular,
as described in more detail below with respect to step 207 of
the method 200 (see Figure 2), this latter type of adjustment
can make use of one or more probability models that describe
expected relationship(s) between a candidate time and the one
or more other candidate times.
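
The first type of adjustment described above, in which a score is raised when a different cue occupies a time window that contains or nearly contains the candidate time, might be sketched as follows. This is a minimal illustration under stated assumptions: the `Cue` record, the function name `adjust_score`, the 3-second margin and the multiplicative boost of 1.5 are hypothetical choices for the example, not values given in this description.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    kind: str     # hypothetical label, e.g. "black_frames" or "no_captions"
    start: float  # beginning time of the cue, in seconds
    end: float    # ending time of the cue, in seconds


def adjust_score(score, candidate_time, cues, kinds_used, margin=3.0, boost=1.5):
    """Raise a candidate's score once per supporting cue.

    A cue supports the candidate when its kind was not already used to
    identify the candidate and the candidate time lies inside the cue's
    time window or within `margin` seconds of it. The multiplicative
    boost is an assumption of this sketch.
    """
    for cue in cues:
        if cue.kind in kinds_used:
            continue  # only cues different from the identifying cues count
        if cue.start - margin <= candidate_time <= cue.end + margin:
            score *= boost
    return score


# Hypothetical cues found in a recording.
cues = [
    Cue("black_frames", 100.0, 100.5),
    Cue("no_captions", 98.0, 160.0),
    Cue("music", 300.0, 360.0),
]

# A candidate found from black frames is supported by the caption gap only.
adjusted = adjust_score(1.0, 100.2, cues, kinds_used={"black_frames"})
```

A different implementation could equally add a fixed increment per supporting cue, or use a cue-specific margin; the sketch only shows the window test itself.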

The invention can be implemented so that after the
adjustment of scores in step 103, scores below a specified
threshold are eliminated. However, this need not necessarily
be done.

In step 104 of the method 100, one or more commercial
breaks are constructed based on an evaluation of the adjusted
scores of the candidate times and relationships among the
candidate times. In particular, as described in more detail
below with respect to step 208 of the method 200 (see
Figure 2), the evaluation of relationships among candidate
times can make use of one or more probability models that
describe expected relationship(s) between candidate times.

The method 100 can be implemented so that step 104 is
iteratively performed. This can improve the identification
of commercial breaks and individual commercials in a set of
audiovisual content by enabling reconsideration of
high-scoring candidate times which were not chosen as a
commercial start or end time in an earlier performance of the
step 104.

Figure 2 is a flow chart of a method 200, according to a
particular embodiment of the invention, which can be
implemented in a system, apparatus or computer program
according to the invention to accomplish commercial detection
in accordance with the invention. The data input to the
method 200 represents audiovisual content (e.g., a television
broadcast). Herein, "audiovisual content" includes one or
both of visual data and audio data, and can also include
closed-captioning data. In some embodiments of the
invention, the input data is stored on a data storage medium
or media, such as, for example, a computer hard disk (DRAM or
SRAM), a CD-ROM, a DVD disk or a VHS tape. As described
below, in other embodiments of the invention, the input data
can represent live (i.e., not stored) audiovisual content
(e.g., a live television broadcast) that is acquired in real
time. As shown in Figure 2, the input data is compressed;
however, this need not necessarily be the case.

The output of the method 200 is a list of commercial
break start and end times within the audiovisual content,
plus lists of the individual commercial start times (and,
therefore, implicitly, the individual commercial end times)
within each commercial break. The identification of the
start and end times of commercial breaks and individual
commercials in the audiovisual content can be used to, for
example, enable editing of the audiovisual content. For
instance, the detected commercials can be deleted from the
audiovisual content or the commercials can be altered in a
desired manner. (The modified data, e.g., data representing
commercial-less audiovisual content, can be stored on a data
storage medium or media.) However, a method according to the
invention need not be used to edit the audiovisual content
within which the commercial breaks and individual commercials
have been detected.

In the first step of the method 200, step 201, the input
data is identified as it is input to the
method 200 (e.g., read from a data storage medium or media).
Apparatus for effecting such identification is known to those
skilled in the art and will depend on the source of the input
data. For example, readers for all useful data storage media
are readily available.

The next step of the method 200, step 202, is to
decompress the raw data, if necessary. The invention does
not require the original input data to be in compressed
format, nor does the invention require the input data to be
in uncompressed format. However, if the input data is in
compressed format, the implementation of the invention
illustrated in Figure 2 requires a decompression mechanism.
The precise form of the decompression mechanism depends on
the compression format, but decompression mechanisms for all
useful forms of compression formats are readily available.

In step 203 of the method 200, the decompressed data is
split into visual, audio and closed-captioning subcomponents.
(As indicated above, one or two of visual, audio and
closed-captioning data may not be present as part of the input
data.) The precise form of the mechanism for splitting apart
the input data can depend on the decompression mechanism
used, but such data-splitting mechanisms are readily
available for any useful compression format.

In step 204, the audiovisual content is evaluated to
identify the presence of one or more cues in the audiovisual
content. Each data subcomponent produced by step 203 is
input into one or more analyzers. The analyzer(s) identify
the location(s) and duration(s) of cue(s) within the
audiovisual content. In particular, the presence of cue(s)
throughout an entire set of audiovisual content can be
identified. The analyzers may also identify other
characteristics of the cue(s). The following is an exemplary
list of characteristics of a set of audiovisual content
regarding which cues can be identified within the set of
audiovisual content: 1) an audio pause (i.e., a period of
silence or near silence) in the audio content, 2) a sequence
of black frames in the visual content, 3) a scene cut or fade
in the visual content, 4) a significant (i.e., greater than a
specified amount) change in average volume in the audio
content, 5) the presence of music in the audio
content, 6) speaker identity, 7) the "density" of scene
breaks (cuts) or fades in the visual content (i.e., the
number of scene breaks and/or scene fades during a specified
time window divided by the duration of the time
window), 8) the absence of a usually present network icon
(whose shape and color characteristics can be learned
automatically by appropriately analyzing a region of the
visual content in which a network icon is expected to be
present, e.g., a region near the edge of the visual content,
such as a corner), 9) the degree of motion in a period
of visual content, 10) the presence of text in the visual
content, 11) the occurrence of specified closed-captioning
formatting signals, and 12) the absence of closed-captioning.
Suitable methods for identifying each of the above-listed
cues have been published in various academic journals,
industry journals and conference proceedings, and are known
to those skilled in the art. As will be appreciated by those
skilled in the art in view of the description herein, other
cues not listed above can also be used, alone or in
combination with each other and/or one or more of the
above-listed cues, to enable commercial detection according to
the invention. The specific methods used to identify the
above-listed or other cues may affect the overall performance of a
system or method according to the invention, but, in general,
any such methods can be used with the invention. However, it
is an advantageous aspect of the invention that the invention
enables use of a combination of the above-listed cues to
effect commercial detection.
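
The "density" cue defined above (the number of scene breaks and/or fades during a time window divided by the duration of the window) lends itself to a short sketch. The following Python example assumes that scene break times have already been detected and sorted; the function name `scene_break_density`, the half-open treatment of the window and the sample times are hypothetical.

```python
from bisect import bisect_left


def scene_break_density(break_times, window_start, window_duration):
    """Scene breaks per second within [window_start, window_start + duration).

    `break_times` must be sorted in ascending order (times in seconds).
    """
    if window_duration <= 0:
        raise ValueError("window_duration must be positive")
    lo = bisect_left(break_times, window_start)
    hi = bisect_left(break_times, window_start + window_duration)
    return (hi - lo) / window_duration


# Hypothetical scene break times, in seconds.
breaks = [2.0, 31.0, 33.5, 36.0, 40.0, 44.5, 52.0, 58.0, 95.0]

dense = scene_break_density(breaks, 30.0, 60.0)   # 7 breaks in 60 s
sparse = scene_break_density(breaks, 90.0, 60.0)  # 1 break in 60 s
```

Comparing the density of one window with that of a proximate window, as in this usage, is one way the cue could be evaluated; the window length of 60 seconds is only an example.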

In an alternative implementation of the invention, the
output from step 201 of the method 200 would be input
directly to some or all of the analysis engines of step 204.
That is, some of the analysis engines can be made to operate
directly on the compressed data, depending on the compression
format. For example, the black-frame detection, scene-cut
detection, motion analysis and audio-level change detection
can all operate directly on data that has been compressed in
the MPEG-1 or MPEG-2 format.

In step 205 of the method 200, one or more of the cues
identified in step 204 are analyzed to identify candidate
times within the audiovisual content at which a commercial
beginning or a commercial ending may occur. For example, an
audio pause often accompanies either the beginning or the end
of a commercial, so the presence of an audio pause in the
audio content can be identified as a factor that militates
toward establishing a candidate time at some time during or
proximate to the audio pause. Similarly, a sequence of black
frames often accompanies either the beginning or the end of a
commercial, so the presence of a sequence of black frames in
the visual content can be identified as a factor that
militates toward establishing a candidate time at some time
during or proximate to the sequence of black frames. A scene
cut or fade also typically accompanies the beginning or the
end of a commercial, so the presence of a scene cut or fade
in the visual content can be identified as a factor that
militates toward establishing a candidate time at some time
during or proximate to the scene cut or fade. The
beginning and end of a commercial break are often accompanied
by a noticeable increase and decrease in volume,
respectively, so a significant change in average volume
(measured over a specified window of time) can be identified
as a factor that militates toward establishing a candidate
time proximate to each time at which the volume is
seen to change significantly. Commercials often include
relatively more musical content than the rest of a set of
audiovisual content, so the occurrence of a time window of
specified duration (e.g., the expected duration of a typical
commercial break, such as 60 seconds, or the expected
duration of a typical commercial, such as 15 or 30 seconds)
having relatively high musical content (e.g., relatively high
density of musical content relative to the density of musical
content in other, proximate time windows) can be identified
as a factor that militates toward establishing candidate
times at the beginning and end of such a time window. The
beginning or end of a commercial is often accompanied by a
change in speaker identity, so the occurrence of a change in
speaker identity can be identified as a factor that militates
toward establishing a candidate time at, or proximate to, the
time at which such a change in speaker identity occurs. A
commercial break often includes a relatively high density of
scene breaks and/or fades (since a scene break or fade
typically occurs at the beginning and end of a commercial
break, as well as at the transition between commercials
within a commercial break, and since commercials often
include a relatively large number of scene breaks and/or
fades per unit time within the commercial), so the occurrence
of a time window of a specified duration (e.g., 60 seconds)
during which the density of scene breaks and/or scene fades
is relatively high (i.e., exceeds a specified threshold), or
a significant change in density of scene breaks and/or scene
fades over one window of time with respect to a proximate
window of time, can be identified as a factor that militates
toward establishing candidate times at the beginning and end
of such a time window. A network icon is sometimes present
during the noncommercial parts of a television broadcast.
If a network icon is determined to be present in a
set of audiovisual content, the disappearance of the network
icon typically accompanies the beginning of a commercial
break and the appearance of the network icon typically
accompanies the end of a commercial break, so the appearance
or disappearance of a network icon can be identified as a
factor that militates toward establishing a candidate time
at, or proximate to, a time at which the network icon appears
or disappears. Since the average motion level in the visual
content of a commercial is often significantly different from
the average motion level of other visual content in a set of
audiovisual content, a significant change in the amount of
motion in the visual content of a time window (e.g., about 60
seconds) relative to the amount of motion in the visual
content in a proximate time window can be identified as a
factor that militates toward establishing candidate times at,
or proximate to, the beginning and end of such a time window.
The appearance of text (other than closed-captioning) in a
set of audiovisual content often accompanies the beginning of
a commercial break and the disappearance of text often
accompanies the end of a commercial break, so the appearance
or disappearance in a set of audiovisual content of text
other than closed-captioning can be identified as a factor
that militates toward establishing a candidate time at, or
proximate to, a time at which text appears or disappears. If
closed-captioning data is present in the data representing
the audiovisual content, a closed-captioning scrolling format
change often occurs at the beginning or the end of a
commercial break, so the occurrence of a closed-captioning
scrolling format change can be identified as a factor that
militates toward establishing a candidate time at, or
proximate to, the time at which such a format change occurs.
If closed-captioning data is present in the data representing
the audiovisual content, the disappearance of
closed-captioning often accompanies the beginning of a
commercial break and the appearance of closed-captioning
often accompanies the end of a commercial break, so the
appearance or disappearance of closed-captioning can be
identified as a factor that militates toward establishing a
candidate time at, or proximate to, a time at which
closed-captioning appears or disappears.
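
As one illustration of how a cue can give rise to candidate times, the following Python sketch locates audio pauses in a sequence of volume samples and places a candidate time at the midpoint of each pause. The silence level, the minimum pause duration, the choice of the midpoint and the names `find_audio_pauses` and `candidate_times_from_pauses` are assumptions made for the example; any published pause detection method could be substituted.

```python
def find_audio_pauses(volume, sample_period, silence_level, min_duration):
    """Return (start, end) times, in seconds, of runs of near silence.

    A run qualifies when every sample is at or below `silence_level`
    and the run lasts at least `min_duration` seconds.
    """
    pauses = []
    run_start = None
    # A trailing sentinel louder than any threshold closes a pause that
    # runs to the end of the data.
    for i, level in enumerate(list(volume) + [float("inf")]):
        if level <= silence_level:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start = run_start * sample_period
            end = i * sample_period
            if end - start >= min_duration:
                pauses.append((start, end))
            run_start = None
    return pauses


def candidate_times_from_pauses(pauses):
    """Place one candidate time at the midpoint of each pause (an assumption)."""
    return [(start + end) / 2.0 for start, end in pauses]


# Hypothetical volume envelope, one sample every 0.1 seconds.
volume = [0.8, 0.7, 0.01, 0.0, 0.02, 0.9, 0.6, 0.01, 0.7, 0.0, 0.0, 0.0]
pauses = find_audio_pauses(volume, sample_period=0.1,
                           silence_level=0.05, min_duration=0.25)
candidates = candidate_times_from_pauses(pauses)
```

Here the single quiet sample at 0.7 seconds is too short to count as a pause, so only the two longer runs produce candidate times.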

As indicated above, it is an advantageous aspect of the
invention that the invention enables use of a combination of
the cues to effect commercial detection. In particular, the
invention can enable the use of cues and combinations of cues
that have not previously been used for commercial detection.
For example, the invention can advantageously enable any one
of detection of the absence of a network icon, an analysis of
musical content present in a set of audiovisual content, the
density of scene breaks and/or fades, or analysis of the
identity of speakers of spoken content to be used alone as a
commercial detection cue. These cues can also be used in any
combination with each other or any other cue. In particular,
it is anticipated that one or more of these cues can
advantageously be used in combination with one or more of the
following cues: 1) the occurrence of an audio pause, 2) the
occurrence of a sequence of black frames, 3) a scene cut or
fade, 4) the occurrence of specified closed-captioning
formatting signals, and 5) the appearance or disappearance of
closed-captioning.

Step 205 outputs a list of candidate times at which
commercials may be beginning or ending, together with a score
or probability associated with each candidate time. In one
implementation of the invention, each candidate time is
assigned the same initial score. Alternatively, the scores
assigned to candidate times can vary. For example, the score
for a candidate time can depend on which cue(s) were used to
identify the candidate time. The beginning or end of a
commercial can be deduced from the presence of some cues with
a greater degree of confidence than that associated with the
presence of other cues. To the extent that a candidate time
is identified based on a cue with which a relatively high
degree of predictive confidence is associated, the score
assigned to that candidate time can be relatively higher than
would be the case if a relatively low degree of predictive
confidence were associated with the cue. Additionally, the
score for each candidate time can be dependent on how
strongly the cue is present in the audiovisual content, as
determined in accordance with a criterion or criteria
appropriate for that cue: the more strongly a cue is
present, the higher the score. For example, when one of the
cues used to establish a candidate time is an audio pause,
the score established for the candidate time can be dependent
on the duration of the audio pause and/or the degree of
silence during the audio pause (e.g., the score for the
candidate time is made relatively greater the longer the
audio pause or the less sound that is present during the
audio pause). Or, for example, when one of the cues used to
establish a candidate time is a sequence of black frames, the
score established for the candidate time can be dependent on
the duration of the sequence of black frames and/or the
completeness of the blackness of the frames (e.g., the score
for the candidate time is made relatively greater the longer
or blacker the sequence of black frames). Or, for example,
when one of the cues used to establish a candidate time is a
scene cut, the score established for the candidate time can
be dependent on the number of pixels that changed by more
than a threshold amount from one frame to another (e.g., the
score for the candidate time is made relatively greater as
more pixels changed between scenes) and/or dependent on the
total change of all the pixels from one frame to another
(where the "change" for each pixel is the change in the color
or other components of a pixel). Or, for example, when one
of the cues used to establish a candidate time is a
significant average audio volume change, the score
established for the candidate time can be dependent on the degree
of the volume change (e.g., the score for the candidate time
is made relatively greater as the degree of the volume change
increases). Those skilled in the art can readily appreciate
how the score for a candidate time can be adjusted based on
aspects of other cues present in the audiovisual content
proximate to the candidate time. Additionally, the score for
a candidate time can be dependent on the confidence level
associated with identification of the cue in the audiovisual
content: the greater the confidence level, the higher the
score. (This confidence level is different from the
confidence level associated with the predictive capability of
the cue, discussed above.) For example, sound represented in
audio data may be sound in the audio content or noise. The
score for a candidate time identified at least in part based
on the presence of an audio pause can be increased or
decreased in accordance with the extent to which the degree of
noise present in the audio data increases or decreases the
confidence with which an audio pause can be detected.
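
The three influences on an initial score discussed above (the predictive confidence associated with the type of cue, how strongly the cue is present, and the confidence with which the cue was detected) can be combined in many ways. The following Python sketch shows one hypothetical combination for an audio pause; the function name `audio_pause_score`, the multiplicative form, the equal weighting of duration and quietness and all default values are assumptions made for the example and are not specified in this description.

```python
def audio_pause_score(duration, mean_level, detection_confidence,
                      cue_weight=1.0, max_duration=2.0, silence_level=0.05):
    """Initial score for a candidate time established from an audio pause.

    duration             -- length of the pause, in seconds
    mean_level           -- average volume during the pause (0 is silent)
    detection_confidence -- 0..1, lower when noise makes the pause doubtful
    cue_weight           -- predictive confidence assigned to this cue type
    """
    # Longer pauses count as stronger, saturating at max_duration.
    duration_strength = min(duration / max_duration, 1.0)
    # Quieter pauses count as stronger; a level at the threshold scores 0.
    quietness = max(0.0, 1.0 - mean_level / silence_level)
    strength = 0.5 * (duration_strength + quietness)
    return cue_weight * strength * detection_confidence


# A 1-second, fully silent, confidently detected pause.
score = audio_pause_score(duration=1.0, mean_level=0.0,
                          detection_confidence=1.0)
```

With these assumed defaults, a longer or quieter pause yields a higher score, and a noisy recording that lowers the detection confidence yields a lower one, matching the qualitative behavior described in the text.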

In step 206 of the method 200, the scores associated
with each candidate time can be adjusted based on the
presence or absence of one or more cues within some time
window proximate to the candidate time. The cue(s) used to
adjust the score of a candidate time in step 206 are
different from the cue(s) used to establish the candidate
time and an initial associated score in step 205. The
duration of the time window and location of the time window
with respect to the cue are dependent on the type of cue. For
instance, the score for a candidate time is increased (i.e.,
the likelihood that the candidate time correctly indicates
the beginning or ending of a commercial is deemed to
increase) in each of the following cases: 1) the candidate
time is coincident with the time at which an audio pause
(which is a window of audio silence or near silence)
occurs, 2) the candidate time is within or sufficiently
proximate to a time window in which the closed-captioning
scrolling format is different from that which is typical for
audiovisual content of this type, 3) the candidate time is
within or sufficiently proximate to a time window during
which closed-captioning is absent (for audiovisual content
that is known to be closed-captioned), 4) the candidate time
is within or sufficiently proximate to a time window of at
least a specified duration (e.g., 60 seconds) and including
high musical content, 5) the candidate time is within or
sufficiently proximate to a time window during which the
density of scene breaks and/or scene fades exceeds a
specified threshold, 6) the candidate time is sufficiently
proximate to a time window of at least a specified duration
(e.g., 0.5 seconds) and in which the average motion in the
visual content, measured in a specified manner, is less than
a specified threshold, 7) the candidate time is within a time
window during which a network icon (which has been found to
be persistent through a majority of the visual content) is
not present at a specified location within the visual content
(e.g., a region, such as a corner, near the edge of the
visual content), 8) the candidate time is very near (e.g.,
within about 2 seconds) a time at which the time-averaged
audio volume (averaged over a time window of about 10
seconds) has changed by a magnitude of greater than a
specified threshold, 9) the candidate time is sufficiently
proximate to (within about 1 second) a time when text is
present in the visual content, 10) the candidate time is
within a specified duration of time (e.g., a few seconds)
after the presence in the closed-captioning stream of certain
keywords or phrases such as "commercial", "break", "coming
up" or "after", or within a specified duration of time (e.g.,
a few seconds) prior to the presence in the closed-captioning
stream of certain keywords or phrases such as "welcome",
"hello" or "we're back", 11) the candidate time is within a
specified duration of time (e.g., 2 seconds) from a time at
which the speaker identity has changed, and 12) the candidate
time is within a specified duration of time (e.g., one to
several seconds) from a time window of greater than a
specified duration (e.g., 1 minute) that does not include
speech from a speaker whose speech has been determined to be
present in the audiovisual content with greater than a
specified frequency. The amount by which a score is adjusted
can be dependent on the same types of analyses done to
establish an initial score for a candidate time, as described
above with respect to step 205. (However, the particular
analyses done in step 206 need not be, but can be, the same as
those done in step 205.) In particular, the amount of the
adjustment to a score for a candidate time can be dependent
on how strongly the cue is present in the audiovisual
content, as determined in accordance with a criterion or
criteria appropriate for that cue: in general, the more
strongly a cue is present, the greater the adjustment to the
score. Additionally, the amount of the adjustment to a score
for a candidate time can be dependent on how high or low the
score is prior to the adjustment. For example, a cue that
strongly indicates the presence of a commercial beginning or
ending may cause a larger adjustment in a relatively low
score than in a relatively high score. The particular
quantities, keywords, and other algorithm parameters given
above are illustrative; they may be changed, within
appropriate constraints, as can be appreciated by those
skilled in the art, without adversely affecting the operation
of the invention.
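
One possible way to realize the two dependencies just described
(a stronger cue gives a larger adjustment, and a low score is
moved more than a high score) is sketched below. The update
rule, the [0, 1] score range and the parameter names
`cue_strength` and `cue_weight` are assumptions chosen for
illustration; they are not taken from the description.

```python
def adjust_score(score, cue_strength, cue_weight=0.5):
    """Raise a score in [0, 1] toward 1 in proportion to a supporting cue.

    The increase is cue_weight * cue_strength * (1 - score), so it grows
    with the strength of the cue and shrinks as the score approaches 1.
    """
    for name, value in (("score", score),
                        ("cue_strength", cue_strength),
                        ("cue_weight", cue_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return score + cue_weight * cue_strength * (1.0 - score)


# The same strong cue moves a low score further than a high score.
low_gain = adjust_score(0.2, 1.0) - 0.2
high_gain = adjust_score(0.8, 1.0) - 0.8
```

With this rule a score can never exceed 1, which keeps repeated
adjustments from several cues bounded.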

In step 207 of the method 200, the scores associated
with each candidate time are further adjusted based on one or
more probability models of characteristic(s) of the
occurrence of commercials and/or commercial breaks within
audiovisual content. For example, the scores of the
candidate times can be adjusted based on a probability model
of the time separation of commercial start and end times.
The scores of the candidate times can also be adjusted based
on a probability model of the typical locations of commercial
breaks within audiovisual content. Such a probability model
can be constructed by collecting statistics regarding the
relevant characteristic(s) across many sets of audiovisual
content of a variety of different types, in order to produce
a generic probability model that applies to all types of
audiovisual content. Such a probability model can also be
made specific to a particular type or types of audiovisual
content (including a particular audiovisual program) by only
combining statistics across audiovisual content of those
type(s). This can be desirable to increase the accuracy
obtained when the probability model is used to aid in
detection of commercials in audiovisual content of those
type(s). Finally, such a probability model can be
constructed manually based on the intuition of the
implementer of the model as to the characteristic being
modeled (e.g., how long commercials typically last or when
commercial breaks tend to occur in given audiovisual
content). However constructed, the probability model(s) can
be represented as functions of time, as described below.

In step 207, one or more probability models can be
applied to the list of score-adjusted candidate times
generated by step 206 to further adjust the scores of the
candidate times. The list of score-adjusted candidate times
generated by step 206 is first represented as a function of
time, P(t), which indicates the likelihood that a commercial
starts or ends at each time during the audiovisual content.
For all candidate times, P(t) can be made equal to the
adjusted score associated with that candidate time (perhaps
normalized by the total of all the adjusted scores), while
for all other times P(t) can be made equal to zero.
Figure 3A is a graph of an example of a function P(t).

A function S(t) is determined, representing a
probability model of the likely location, relative to a
particular commercial start or end time, of other commercial
start or end times. The function S(t) can be particularized
to be representative of a particular type or types of
audiovisual content. For example, for American television
programs, the function S(t) will have peaks around
+/- 15 seconds, +/- 30 seconds and +/- 60 seconds. Figure 3B
is a graph of an example of a function S(t).

P(t) is convolved with S(t) to produce a function C(t).
The value of C(t) will be very high at times at which a high
value of P(t) is separated from other times having a high
value of P(t) by time durations for which S(t) has a high
value. In other words, C(t) will be very large at values of t
which are high-scoring candidate times and where there are
other high-scoring candidate times before and/or after t by an
amount of time corresponding to a typical commercial duration
(e.g., 30 seconds). Figure 3C is a graph of a function C(t)
produced by convolving the function P(t) of Figure 3A with
the function S(t) of Figure 3B.
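
The construction of P(t) and its convolution with S(t) can be
sketched on a one-second time grid as follows. The grid
resolution, the triangular shape and width of the peaks of
S(t), and all function names are assumptions for illustration;
the actual shapes are those of Figures 3A through 3C, which
this sketch does not attempt to reproduce.

```python
def build_p(candidate_scores, duration):
    """P(t): the adjusted score at each candidate time, zero elsewhere."""
    p = [0.0] * duration
    for t, score in candidate_scores.items():
        p[t] = score
    return p


def build_separation_model(peaks=(15, 30, 60), half_width=2):
    """S(t) as {offset in seconds: weight}, with symmetric triangular peaks."""
    model = {}
    for peak in peaks:
        for delta in range(-half_width, half_width + 1):
            weight = 1.0 - abs(delta) / (half_width + 1)
            for sign in (1, -1):
                offset = sign * (peak + delta)
                model[offset] = max(model.get(offset, 0.0), weight)
    return model


def convolve(p, model):
    """C(t) = sum over tau of P(tau) * S(t - tau), restricted to the timeline."""
    c = [0.0] * len(p)
    for tau, value in enumerate(p):
        if value == 0.0:
            continue
        for offset, weight in model.items():
            t = tau + offset
            if 0 <= t < len(p):
                c[t] += value * weight
    return c


# Candidates at 100 s and 130 s are 30 s apart; 147 s fits less well.
p = build_p({100: 0.9, 130: 0.8, 147: 0.7}, duration=300)
c = convolve(p, build_separation_model())
```

In this toy timeline the candidate at 130 seconds receives the
largest value of C(t), because it has well-spaced, high-scoring
neighbors on both sides.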

A function R(t) is determined, representing a
probability model of the likelihood, at all times within a
set of audiovisual content, that a commercial is in progress.
If the probability model is restricted to be based on a
particular class of well-defined sets of audiovisual content
(e.g., the different versions of a particular recurring
audiovisual program) and if commercials tend to be placed at
approximately the same times in each such set of audiovisual
content, then the probability model will have well-defined
zones during which the probability of a commercial being in
progress is high. On the other hand, if the audiovisual
content on which the probability model is based is not
restricted at all, so that the probability model is learned
across all types of audiovisual content, the probability
model will likely be close to a uniform distribution (a flat
function), which is not very useful. Therefore, it is
desirable to restrict the audiovisual content on which the
probability model is based. In particular, it is desirable
to base the probability model which the function R(t)
represents on audiovisual content which is similar to that in
which commercials are to be detected. Figure 3D is a graph
of an example of a function R(t).

As indicated above, step 207 begins by convolving the
function P(t) with the function S(t) to produce the function
C(t). The function C(t) is then point-wise multiplied by the
function R(t) to produce a function P'(t). Figure 3E is a
graph of a function P'(t) produced by point-wise multiplying
the function C(t) of Figure 3C by the function R(t) of
Figure 3D. The function P'(t) is resampled at the candidate
times: these samples represent further adjusted scores for
the candidate times.
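
The point-wise multiplication and resampling can be sketched as
below. Because only the samples of P'(t) at the candidate times
are used afterwards, this sketch evaluates the product only at
those times; the list representation of C(t) and R(t) and the
toy values are assumptions for illustration.

```python
def further_adjust(c, r, candidate_times):
    """Return {candidate time: C(t) * R(t)}, i.e. P'(t) at the candidates."""
    if len(c) != len(r):
        raise ValueError("C(t) and R(t) must cover the same timeline")
    return {t: c[t] * r[t] for t in candidate_times}


# Toy 10-sample timeline: a commercial is unlikely early and likely late.
c = [0.0] * 10
c[2], c[7] = 0.8, 0.6
r = [0.1] * 5 + [0.9] * 5
adjusted = further_adjust(c, r, [2, 7])
```

In this toy case the candidate at index 7 ends up with the
higher further-adjusted score even though its C(t) value is
lower, because R(t) is much larger there.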

The method 200 is described above as including both
score adjustments of the type described in step 206 and score
adjustments of the type described in step 207. However, the
method 200 could be implemented with only one of those types
of score adjustments, i.e., the method 200 could include only
one of steps 206 and 207.

After the adjustment of scores in step 207, optionally, 
scores below a specified threshold can be eliminated. 

In step 208 of the method 200, the candidate times and
associated adjusted scores are evaluated, and starting and
ending times for commercial breaks and individual commercials
within those commercial breaks are identified based on that
evaluation. Two additional probability models (which can be
represented as functions of time) are used in this
evaluation: 1) a function L(t), which represents a
probability model of the typical duration of a commercial
break, and 2) a function W(t), which represents a probability
model of the typical time separation between commercial
breaks. Each of these probability models, like the probability
models discussed above with respect to step 207, can be
constructed based on statistics collected across many types
of audiovisual content or across only a particular type or
types of audiovisual content, or can be constructed based on
the intuition of the implementer of the model regarding the
characteristic being modeled. Figure 3F is a graph of an
example of a function L(t) and Figure 3G is a graph of an
example of a function W(t).

Step 208 begins by selecting the candidate time with the
highest score to be a commercial start or end time (whether
that time is a start time or end time is unknown at this
point). A commercial break is then constructed based on the
selected candidate time by successively evaluating candidate
times in order of decreasing score and adding candidate times
to the commercial break that satisfy each of the following
criteria: 1) the additional candidate time is well-spaced in
time, in accordance with the function S(t), from each
candidate time that has already been included in the
commercial break, 2) the additional candidate time does not
create a commercial break which is too long, in accordance
with the function L(t), and 3) the additional candidate time
is not too close to other existing commercial breaks, in
accordance with the function W(t), that have already been
identified by the step 208. Stated another way, candidate
times continue to be added to a commercial break in order of
score as long as there are any candidate times for which all
of the following are true: 1) the value of S(t), where "t"
is the time separation between the candidate time being
evaluated and a candidate time already included in the
commercial break, is above a specified threshold value for
each candidate time already included in the commercial
break, 2) the value of L(t), where "t" is the duration of the
commercial break if the candidate time is added to the
commercial break, is not below a specified threshold value,
and 3) the value of W(t), where "t" is the time separation
between the candidate time and an existing commercial break,
is not below a specified threshold value for each existing
commercial break.
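
The three criteria above can be sketched as a greedy loop. The
models `s_model`, `l_model` and `w_model` below are deliberately
crude stand-ins (hypothetical step functions) for S(t), L(t)
and W(t), and the thresholds of 0.5 are arbitrary. A single
pass in order of decreasing score is used, which is a
simplification that suffices for these particular stand-in
models.

```python
def gap_to_break(t, commercial_break):
    """Seconds between time t and a (start, end) break; 0 if t is inside it."""
    start, end = commercial_break
    if start <= t <= end:
        return 0
    return min(abs(t - start), abs(t - end))


def build_break(seed, scores, s_model, l_model, w_model,
                existing_breaks=(), s_min=0.5, l_min=0.5, w_min=0.5):
    """Grow one commercial break around seed, visiting candidates by score."""
    members = [seed]
    for t in sorted(scores, key=scores.get, reverse=True):
        if t in members:
            continue
        # 1) S(t) must be above threshold relative to every current member.
        well_spaced = all(s_model(abs(t - m)) > s_min for m in members)
        # 2) L(t) for the resulting break duration must not be below threshold.
        duration = max(members + [t]) - min(members + [t])
        not_too_long = l_model(duration) >= l_min
        # 3) W(t) must not be below threshold for every existing break.
        clear_of_others = all(
            w_model(gap_to_break(t, b)) >= w_min for b in existing_breaks
        )
        if well_spaced and not_too_long and clear_of_others:
            members.append(t)
    return sorted(members)


def s_model(dt):
    """Stand-in S(t): high near multiples of 15 s, up to 180 s apart."""
    if dt == 0 or dt > 180:
        return 0.0
    remainder = dt % 15
    return 1.0 if min(remainder, 15 - remainder) <= 1 else 0.0


def l_model(duration):
    """Stand-in L(t): breaks of up to 240 s are considered plausible."""
    return 1.0 if duration <= 240 else 0.0


def w_model(gap):
    """Stand-in W(t): breaks should be at least 300 s apart."""
    return 1.0 if gap >= 300 else 0.0


scores = {600: 0.9, 630: 0.8, 645: 0.7, 652: 0.6, 1000: 0.5}
first_break = build_break(600, scores, s_model, l_model, w_model)
```

With these stand-in models the break built around the seed at
600 seconds keeps the candidates at 630 and 645 seconds and
rejects those at 652 and 1000 seconds for poor spacing.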

Once no more candidate times can be added to the
commercial break currently being constructed, step 208
attempts to find a new candidate time around which to
construct a new commercial break. The candidate time with
the highest score above a specified threshold that is not
currently part of any commercial break, and that is separated
from all existing commercial breaks by a time "t" for which
W(t) is above a specified threshold, is selected as a new
candidate time upon which to base the construction of a new
commercial break. If such a candidate time is identified,
then the construction of a commercial break proceeds as
described above. If no such candidate time can be
identified, then step 208 terminates. The output of step 208
is a list of commercial break start and end times, and lists
of start times of individual commercials within each
commercial break.

Finally, step 209 of the method 200 causes the set of
decisions made in step 208 regarding start and end times of
commercial breaks and commercials to be iteratively refined
and optimized. Step 209 attempts to account for other
high-scoring candidate times which may occur in or near a
previously identified commercial break, but which have not
yet been chosen as a commercial start or end time. In
step 209, the following procedure is performed for each
candidate time which has not yet been selected as a
commercial start or end time and for which P'(t) has a
magnitude above a specified threshold value.

First, the candidate time is added to the most
temporally proximate commercial break, provided that the
candidate time would not cause that commercial break to
become too long (i.e., the value of L(t), where "t" is the
duration of the commercial break if the candidate time is
added to the commercial break, is not below a specified
threshold value) and would not cause that commercial break to
be too close to another commercial break (i.e., the value of
W(t), where "t" is the time separation between the candidate
time and an existing commercial break, is not below a
specified threshold value for each existing commercial
break).

Next, additional candidate times that are not part of
the commercial break to which a candidate time has just been
added are evaluated according to the same criteria, described
above, that were used to construct the commercial break in
the first place. That is, the probability models S(t), L(t),
and W(t) are all considered as described above.

The new commercial break may have too many candidate
times within it. Thus, an attempt is made to remove
candidate times that are located too closely together. For
each candidate time of the commercial break, a computation is
made of the average S(t) for the time separation between the
candidate time and the two adjacent candidate times. If the
average is below a specified threshold, then it may be that
either the candidate time or the adjacent candidate times are
not accurate commercial start or end times. If the P'(t)
score of the candidate time is lower than the average P'(t)
of the two adjacent candidate times, the candidate time is
eliminated from the commercial break. Otherwise, either or
both of the two adjacent candidate times that are within a
specified small time separation (e.g., 25 seconds) of the
candidate time are eliminated from the commercial break.
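
A minimal sketch of this pruning rule follows. The description
does not say how the first and last candidate times of a break
(which have only one neighbor) are handled, nor in what order
removals take effect; the sketch assumes that the available
neighbor(s) are used and that each removal takes effect
immediately. The stand-in `s_model` and the threshold values
are likewise assumptions.

```python
def s_model(dt):
    """Stand-in S(t): high near multiples of 15 s, up to 180 s apart."""
    if dt == 0 or dt > 180:
        return 0.0
    remainder = dt % 15
    return 1.0 if min(remainder, 15 - remainder) <= 1 else 0.0


def prune_break(members, scores, s_model, s_min=0.5, close_gap=25):
    """Remove poorly spaced candidate times from one commercial break."""
    times = sorted(members)
    i = 0
    while i < len(times):
        t = times[i]
        neighbors = [times[j] for j in (i - 1, i + 1) if 0 <= j < len(times)]
        if not neighbors:
            break
        avg_s = sum(s_model(abs(t - n)) for n in neighbors) / len(neighbors)
        if avg_s >= s_min:
            i += 1                      # spacing is acceptable; move on
            continue
        avg_neighbor_score = sum(scores[n] for n in neighbors) / len(neighbors)
        if scores[t] < avg_neighbor_score:
            times.pop(i)                # drop the weaker candidate itself
            continue                    # index i now holds the next time
        too_close = [n for n in neighbors if abs(t - n) <= close_gap]
        if not too_close:
            i += 1
            continue
        times = [x for x in times if x not in too_close]
        i = times.index(t) + 1          # continue after the kept candidate
    return times


weak_middle = prune_break(
    [600, 630, 637, 660],
    {600: 0.9, 630: 0.8, 637: 0.3, 660: 0.7},
    s_model,
)
strong_middle = prune_break(
    [600, 630, 638, 660],
    {600: 0.9, 630: 0.4, 638: 0.95, 660: 0.7},
    s_model,
)
```

In the first call the low-scoring time at 637 seconds is
dropped; in the second, the high-scoring time at 638 seconds is
kept and both of its close neighbors are dropped instead.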

Finally, after the addition(s) and elimination(s) of
candidate times have been made, a new average score is
calculated for the candidate times of the commercial break.
If the average score is higher than the average score for the
candidate times of the commercial break before changes were
made in step 209, the changes are kept. Otherwise, the
changes are discarded and the candidate times of the
commercial break revert back to the candidate times before
the addition(s) and elimination(s) made in step 209.
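
The keep-or-revert decision can be sketched as a comparison of
average scores. The function name and the treatment of an empty
break (average of zero) are assumptions for illustration.

```python
def keep_if_improved(before, after, scores):
    """Return the revised break only if its average score is higher."""
    def average(times):
        return sum(scores[t] for t in times) / len(times) if times else 0.0

    return after if average(after) > average(before) else before


scores = {600: 0.9, 630: 0.8, 645: 0.3, 660: 0.95}
# Adding a weak candidate lowers the average, so the change is discarded.
reverted = keep_if_improved([600, 630], [600, 630, 645], scores)
# Adding a strong candidate raises the average, so the change is kept.
kept = keep_if_improved([600, 630], [600, 630, 660], scores)
```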

After an attempt has been made to include each of the
candidate times for which P'(t) is above a threshold, a check
is made to see if any of these candidate times were in fact
included in a commercial break. If none were, then step 209
terminates and outputs the final determination(s) of the
method 200 regarding the start and end times of commercial
breaks and individual commercials in the audiovisual content.
If any of these candidate times were included in a commercial
break, then step 209 is repeated: for each candidate time
having a score P'(t) above a specified threshold, an attempt
is made to add that candidate time to the nearest commercial
break. Many iterations of step 209 may be required before a
stable solution is produced (i.e., further changes are not
made to the determined start and end times of commercial
breaks and individual commercials in the audiovisual
content).

The invention as described above can be modified
slightly in order to operate on audiovisual content that is
being received live via a tuner, cable or other means, and to
produce commercial detection results with minimal delay
relative to the current live position within the audiovisual
content. To accomplish this, if the cue(s) are not
transmitted with the data representing the audiovisual
content, the invention is implemented to enable the analysis
of the audiovisual content to identify the presence of cues
(the invention can be implemented to identify one, some or all
of the cues described above with respect to step 204 of the
method 200 of Figure 2) at a rate that is at least as fast as
the rate at which the data representing the audiovisual
content is being received. The invention is further
implemented so that this can be done while receiving the
transmission of the data representing the audiovisual
content. The invention is implemented to evaluate lists of
identified cues over a window from the present time "t" back
to a time "T" seconds into the past. If the computations
done in steps 205 through 209 of the method 200 of Figure 2
(or comparable steps of another method according to the
invention, such as the steps 102 through 104 of the
method 100 of Figure 1) can be done in N seconds, a new set
of commercial location estimates up to the current time "t"
minus approximately N seconds can be produced. This can be
done every N seconds. For instance, if N = 0.5 seconds, the
commercial locations up to the current time with 0.5 seconds
delay can be computed and this can be done every 0.5 seconds.
This would be sufficiently fast for commercial detection
applications that require actions to be taken in roughly
"real-time," e.g., changing the channel or stopping recording
when a commercial begins. The accuracy of such an embodiment
of the invention may not be as high as for an embodiment that
operates on audiovisual content that is stored in its
entirety, since, in the latter case, information beyond time
"t" can be used to improve the commercial detection decision
at time "t".
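
The sliding evaluation window for live operation might be
maintained as sketched below. The class name, the
(time, kind) cue representation and the assumption that cues
arrive in time order are all hypothetical; the batch analysis
of steps 205 through 209 would be run on each snapshot and is
not shown.

```python
from collections import deque


class LiveCueWindow:
    """Keeps only the cues identified in the last window_seconds seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._cues = deque()

    def add_cue(self, time, kind):
        """Record a cue; cues are assumed to arrive in increasing time order."""
        self._cues.append((time, kind))

    def snapshot(self, now):
        """Drop cues older than now - window_seconds and return the rest."""
        oldest_allowed = now - self.window_seconds
        while self._cues and self._cues[0][0] < oldest_allowed:
            self._cues.popleft()
        return list(self._cues)


window = LiveCueWindow(window_seconds=60)
window.add_cue(0, "black_frames")
window.add_cue(50, "audio_pause")
window.add_cue(100, "scene_cut")
recent = window.snapshot(now=120)   # cues from 60 s to 120 s only
```

A caller would take such a snapshot every N seconds and pass it
to the detection steps, accepting that decisions near the
current time cannot use cues that have not yet arrived.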

The invention can be implemented entirely in software, 
entirely in hardware (e.g., using DSPs and/or special purpose 
ASICs) or in a combination of the two. Firmware can also be 
used to implement some part or all of the invention. 

The invention can be used for a wide variety of
applications, as can be appreciated by those skilled in the
art in view of the description herein. In general, the
invention can be used with any broadcast or other data
transmission over a network (e.g., conventional network
television broadcasts, cable television broadcasts,
broadcasts or transmissions over a computer network such as
the Internet and, in particular, the World Wide Web portion
of the Internet). Additionally, the invention can be used
generally to detect commercials in audiovisual content
represented by any type of data, which data can be stored on
a data storage medium or media, or provided to a system or
method according to the invention in real time. Further, the
invention can be implemented in a wide variety of apparatus,
as can also be appreciated by those skilled in the art in
view of the description herein, such as, for example,
television set-top boxes, digital VCRs, computers (including
desktop, portable or handheld computers) or any of a variety
of other computational devices (including many which are now,
or will in the future be, developed).

Various embodiments of the invention have been
described. The descriptions are intended to be illustrative,
not limitative. Thus, it will be apparent to one skilled in
the art that certain modifications may be made to the
invention as described herein without departing from the
scope of the claims set out below.

I claim:

1. A method for detecting one or more commercial
breaks in a set of audiovisual content, each commercial break
including one or more commercials, the method comprising the
steps of:

identifying candidate times within the set of
audiovisual content based on an evaluation of one or
more cues identified in the audiovisual content, each
candidate time representing a possible starting time or
a possible ending time of a commercial;

assigning a score to each candidate time;

evaluating, for each of one or more candidate
times, 1) one or more cues other than the one or more
cues used to identify the candidate time, and/or 2) the
relationship between the candidate time and one or more
other candidate times, wherein the score assigned to the
candidate time can be adjusted based on the evaluation;
and

constructing the one or more commercial breaks
based on an evaluation of 1) the scores of the candidate
times after the step of evaluating and 2) the
relationship among the candidate times.

2. A method as in Claim 1, further comprising the step
of identifying the presence of one or more cues in the
audiovisual content.

3. A method as in Claim 2, wherein the step of
identifying the presence of one or more cues in the
audiovisual content further comprises evaluating the
audiovisual content to identify the presence of one or more
cues regarding one or more of the following possible
characteristics of the audiovisual content: 1) an audio
pause in the audio content, 2) a sequence of black frames in
the visual content, 3) a scene cut or fade in the visual
content, 4) a significant change in average volume in the
audio content, 5) the presence of music in the audio
content, 6) speaker identity, 7) the density of scene breaks
or fades in the visual content, 8) the absence of a usually
present network icon, 9) the degree of motion in a period of
visual content, 10) the presence of text in the visual
content, 11) the occurrence of specified closed-captioning
formatting signals and 12) the absence of closed-captioning.

4. A method as in Claim 2, wherein the step of
identifying the presence of one or more cues in the
audiovisual content further comprises evaluating the
audiovisual content to identify the presence of a cue
regarding the absence of a usually present network icon.

5. A method as in Claim 2, wherein the step of
identifying the presence of one or more cues in the
audiovisual content further comprises evaluating the
audiovisual content to identify the presence of a cue
regarding the presence of music in the audio content.

6. A method as in Claim 2, wherein the step of
identifying the presence of one or more cues in the
audiovisual content further comprises evaluating the
audiovisual content to identify the presence of a cue
regarding the density of scene breaks or fades in the visual
content.

7. A method as in Claim 2, wherein the step of
identifying the presence of one or more cues in the
audiovisual content further comprises evaluating the
audiovisual content to identify the presence of a cue
regarding speaker identity.

8. A system for detecting one or more commercial
breaks in a set of audiovisual content, each commercial break
including one or more commercials, the system comprising:

means for identifying the presence of one or more
cues in the audiovisual content;

means for identifying candidate times within the
set of audiovisual content based on an evaluation of one
or more cues identified in the audiovisual content, each
candidate time representing a possible starting time or
a possible ending time of a commercial;

means for assigning a score to each candidate time;

means for evaluating, for each of one or more
candidate times, 1) one or more cues other than the one
or more cues used to identify the candidate time,
and/or 2) the relationship between the candidate time
and one or more other candidate times, wherein the score
assigned to the candidate time can be adjusted based on
the evaluation; and

means for constructing the one or more commercial
breaks based on an evaluation of 1) the scores of the
candidate times after the step of evaluating and 2) the
relationship among the candidate times.

9. A computer readable storage medium or media encoded
with one or more computer programs including instructions for
detecting one or more commercial breaks in a set of
audiovisual content, each commercial break including one or
more commercials, the one or more computer programs
comprising:

instructions for identifying the presence of one or
more cues in the audiovisual content;

instructions for identifying candidate times within
the set of audiovisual content based on an evaluation of
one or more cues identified in the audiovisual content,
each candidate time representing a possible starting
time or a possible ending time of a commercial;

instructions for assigning a score to each
candidate time;

instructions for evaluating, for each of one or
more candidate times, 1) one or more cues other than the
one or more cues used to identify the candidate time,
and/or 2) the relationship between the candidate time
and one or more other candidate times, wherein the score
assigned to the candidate time can be adjusted based on
the evaluation; and

instructions for constructing the one or more
commercial breaks based on an evaluation of 1) the
scores of the candidate times after the step of
evaluating and 2) the relationship among the candidate
times.

10. A method for detecting one or more commercial
breaks in a set of audiovisual content, each commercial break
including one or more commercials, the method comprising the
steps of:

identifying candidate times within the set of
audiovisual content based on an evaluation of one or
more cues identified in the audiovisual content, each
candidate time representing a possible starting time or
a possible ending time of a commercial;

assigning a score to each candidate time; and

constructing the one or more commercial breaks
based on an evaluation of 1) the scores of the candidate
times after the step of evaluating and 2) the
relationship among the candidate times.

11. A method for detecting one or more commercial
breaks in a set of audiovisual content, each commercial break
including one or more commercials, wherein candidate times,
each candidate time representing a possible starting time or
a possible ending time of a commercial within the set of
audiovisual content, have been identified based on an
evaluation of one or more cues identified in the audiovisual
content and a score assigned to each candidate time, the
method comprising the steps of:

evaluating, for each of one or more candidate
times, 1) one or more cues other than the one or more
cues used to identify the candidate time, and/or 2) the
relationship between the candidate time and one or more
other candidate times, wherein the score assigned to the
candidate time can be adjusted based on the evaluation;
and

constructing the one or more commercial breaks
based on an evaluation of 1) the scores of the candidate
times after the step of evaluating and 2) the
relationship among the candidate times.

ABSTRACT 

The invention identifies starting and ending times of
commercial breaks, as well as starting and ending times of
commercials within those commercial breaks, in
audiovisual content (e.g., a television broadcast) using a
method having characteristics which overcome disadvantages of
previous commercial detection approaches. The invention is
implemented as a solution to a "batch optimization" problem
in which commercial locations within a set of audiovisual
content are detected as a group by choosing a set of
commercial locations which optimizes a cost function that
can include consideration of, for example, 1) one or more of
many types of visual recording, audio recording and/or
closed-captioning cues, 2) relative locations of commercials
within the audiovisual content, and/or 3) probability models
based on statistics obtained regarding characteristics of
typical commercials and commercial breaks (e.g., commercial
and commercial break duration, separation times of
commercials and commercial breaks, likelihood of the presence
of a commercial at any given time in a set of audiovisual
content). Optimization can be done over the total set of
commercial location decisions, rather than on a
per-commercial basis. Additionally, the cost function can be
iteratively evaluated, increasing the accuracy of commercial
location decisions produced by the method. Additionally,
many more types of cues and combinations of cues can be used
in detection of commercials in accordance with the invention
than have been used in other approaches to commercial
detection.
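
As a rough illustration of optimizing over the total set of commercial location decisions rather than on a per-commercial basis, the following sketch selects the set of non-overlapping commercial intervals that maximizes a total score over all candidate boundary times at once. It uses a simple dynamic program as a stand-in optimizer; the abstract does not prescribe this technique. The function names, the log-odds cue representation, and the Gaussian-shaped duration model are all assumptions made for the example.

```python
def duration_log_prior(d, typical=(15.0, 30.0, 60.0), sigma=1.0):
    """Assumed probability model: unnormalized Gaussian log-score around
    the nearest typical commercial duration (seconds)."""
    nearest = min(typical, key=lambda m: abs(d - m))
    return -0.5 * ((d - nearest) / sigma) ** 2


def detect_commercials(times, cue_logodds):
    """times: sorted candidate boundary times (s).
    cue_logodds[i] > 0 means the cues favor a boundary at times[i].
    Returns the (start, end) commercial intervals whose combined score
    (cue evidence plus duration prior) is maximal over the whole set."""
    n = len(times)
    best = [0.0] * n      # best[j]: best total score using times[0..j]
    choice = [None] * n   # choice[j]: start index of a commercial ending at j
    for j in range(1, n):
        best[j], choice[j] = best[j - 1], None   # option: no commercial ends at j
        for i in range(j):
            gain = (cue_logodds[i] + cue_logodds[j]
                    + duration_log_prior(times[j] - times[i]))
            if best[i] + gain > best[j]:
                best[j], choice[j] = best[i] + gain, i
    # Backtrack from the last candidate to recover the chosen intervals.
    intervals, j = [], n - 1
    while j > 0:
        if choice[j] is None:
            j -= 1
        else:
            intervals.append((times[choice[j]], times[j]))
            j = choice[j]
    return intervals[::-1]
```

Because every decision is scored jointly, a boundary that looks plausible in isolation can be rejected when no consistent set of commercials includes it. The iterative re-evaluation of the cost function mentioned above is not shown here.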
DECLARATION AND POWER OF ATTORNEY FOR PATENT APPLICATION
As a below-named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below 
adjacent to my name. 

I believe I am the original, first and sole inventor (if only one name is 
listed below) or an original, first and joint inventor (if plural names are 
listed below) of subject matter (process, machine, manufacture, or 
composition of matter, or an improvement thereof) which is claimed and for 
which a patent is sought by way of the application entitled Iterative,
Maximally Probable, Batch-Mode Commercial Detection for Audiovisual
Content,

which (check) [X] is attached hereto
              [ ] and is amended by the Preliminary Amendment attached hereto.
              [ ] was filed on            as Application Serial No.
              [ ] and was amended on           .

I hereby state that I have reviewed and understood the contents of the
above-identified application, including the claims, as amended by any
amendment referred to above.

I acknowledge the duty to disclose to the United States Patent and
Trademark Office information known to me to be material to the examination
of this application in accordance with Title 37, Code of Federal
Regulations, § 1.56(a).

I hereby claim the priority benefit under Title 35, United States Code,
§ 119 of any foreign application(s) for patent or inventor's certificate
listed below and have also identified below any foreign application for
patent or inventor's certificate for the same invention having a filing
date before that of the application on which priority is claimed:

Prior Foreign Application(s) Priority Claimed?

N/A

(Number) (Country) (Date Filed) Yes No



(Number) (Country) (Date Filed) Yes No

I hereby claim the priority benefit under Title 35, United States Code, 
§§ 119 and 365(a) of any international patent application(s), listed below,
that do not designate the United States, but do designate at least one
country other than the United States, and have also identified below any
such international application for the same invention having a filing date
before that of the application on which priority is claimed:

Prior International Application(s) Priority Claimed?

N/A 

(Number) (Date Filed) Yes No 



(Number) (Date Filed) Yes No 
I hereby claim the priority benefit under Title 35, United States Code,
§ 119(e) of the United States provisional patent application(s) listed
below and, insofar as any subject matter of the claims of this application
is not disclosed in such prior United States provisional application(s) in
the manner provided by the first paragraph of Title 35, United States Code,
§ 112, I acknowledge the duty to disclose material information as defined
in Title 37, Code of Federal Regulations, § 1.56(a) which became available
between the filing date of the prior provisional application(s) and the
national or PCT international filing date of this application:

60/166,528 November 18, 1999 Pending

(Appl. Ser. No.) (Date Filed) (Status-patented, pending, abandoned) 



(Appl. Ser. No.) (Date Filed) (Status-patented, pending, abandoned) 

I hereby claim the priority benefit under Title 35, United States Code,
§ 120 of the United States patent application(s) listed below and, insofar
as any subject matter of the claims of this application is not disclosed in
such prior United States application(s) in the manner provided by the first
paragraph of Title 35, United States Code, § 112, I acknowledge the duty to
disclose material information as defined in Title 37, Code of Federal
Regulations, § 1.56(a) which became available between the filing date of
the prior application(s) and the national or PCT international filing date
of this application:

N/A

(Appl. Ser. No.) (Date Filed) (Status-patented, pending, abandoned)



(Appl. Ser. No.) (Date Filed) (Status-patented, pending, abandoned) 

I hereby claim the priority benefit under Title 35, United States Code,
§§ 120 and 365(c) of any international patent application(s), listed below,
that designate the United States and have also identified below any such
international application for the same invention having a filing date
before that of the application(s) on which priority is claimed, and,
insofar as any subject matter of the claims of this application is not
disclosed in such prior international application(s) in the manner provided
by the first paragraph of Title 35, United States Code, § 112, I
acknowledge the duty to disclose material information as defined in Title
37, Code of Federal Regulations, § 1.56(a) which became available between
the filing date of the prior international application(s) and the national
or PCT international filing date of this application:

Prior International Application(s) Priority Claimed?



N/A 

(Number) (Date Filed) Yes No 



(Number) (Date Filed) Yes No 
I hereby appoint the following attorney, with full power of substitution,
to prosecute this application and to transact all business in the United 
States Patent and Trademark Office connected therewith: David R. Graham, 
Reg. No. 36,150. 

Please address all correspondence regarding this application to David R. 
Graham, 1337 Chewpon Avenue, Milpitas, California 95035.

Please direct all telephone calls regarding this application to David R. 
Graham at telephone number (408) 945-9912. 

I hereby declare that all statements made herein of my own knowledge are 
true and that all statements made herein on information and belief are 
believed to be true; and further that these statements were made with the 
knowledge that willful false statements and the like so made are punishable 
by fine or imprisonment, or both, under Title 18, United States Code, 
§ 1001 and that such willful false statements may jeopardize the validity 
of the application or any patent issued thereon. 



Inventor's signature Date 

Full name of inventor Michael L. Harville 



Residence Palo Alto, California Citizenship U.S.

Post Office Address P.O. Box 60181



Palo Alto, California 94306 



Inventor's signature Date

Full name of inventor



Residence Citizenship

Post Office Address



Inventor's signature Date

Full name of inventor



Residence Citizenship

Post Office Address



Inventor's signature Date 

Full name of inventor 



Residence Citizenship 

Post Office Address 



Inventor's signature Date 

Full name of inventor 

Residence Citizenship 

Post Office Address